A stable, safe implementation of "Struct Target Features" #108

DJMcNab · 2025-10-14T16:27:47Z

The core contributions of this PR are:

- A trait which a struct (which should be zero-sized) can implement, indicating that it is a type-level proof that a set of target features is enabled.
- The trampoline macro, which validates a #[target_feature(enable = "xxx")] string against values of one or more of these types, ensuring at compile time that a call to a #[target_feature] function will be safe, and then calls it.
- A corresponding struct for each target feature on x86[-64]; these are code generated.

The state of this feature is:

- It is not yet used for implementing the Fearless SIMD crate.
- ~~The x86-64-v{1,2,3,4} level implementations do not exist/are extremely incomplete.~~
- ~~Some docs are missing (these are however not the most critical docs, it's only docs on the groupings of x86 features).~~
- It does not have support for aarch64 in the architecture levels. This is not hard; it's just data wrangling.

There is also an open licensing question, around the docs taken from the Rust reference. My preference would be to copy https://github.com/rust-lang/reference/blob/1d930e1d5a27e114b4d22a50b0b6cd3771b92e31/LICENSE-MIT#L1 into our LICENSE-MIT, which avoids having to make a decision about copyright-ability here.

My proposed next steps are:

- Discuss this at Renderer Office Hours tomorrow: Done
- If we decide this is a direction we want to follow, clean up and land this PR.
  - Bump our MSRV to 1.89 to take rust-lang/rust#111137 (Tracking Issue for AVX512 intrinsics) into account: Done
  - Add final docs: Done
  - Validate that things are up-to-date in CI: Done
  - True x86-64-v{1,2,3,4} level support: Done
- Follow-up with:
  - aarch64 support
  - Automatic selection/an enum of x86-64 levels
  - Using it in the implementation of Fearless SIMD itself

For review:

You can mostly ignore the contents of fearless_simd_core/x86/xxx/xxx.rs, as these are entirely automatically generated. The exceptions are the fearless_simd_core/x86/xxx/mod.rs files, which are hand-written but don't contain any logic.

Discussed on Zulip: #simd > Removing `safe-wrappers`

fearless_simd_core/src/x86/adx/adx.rs

DJMcNab · 2025-10-14T16:37:02Z

fearless_simd_core/src/lib.rs

This file (and trampoline.rs) contain the main code needed to understand this PR.

DJMcNab · 2025-10-14T16:37:36Z

fearless_simd_core/src/support.rs

+/// See the module level docs [self].
+///
+/// We require static lifetimes as this is primarily internal to the macro.
+pub const fn is_feature_subset<const N: usize>(


This function needs the most careful review, because its correctness is being relied upon for safety.
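To make the review target concrete, here is a minimal sketch of a compile-time subset check over feature names. It is not the PR's `is_feature_subset` (whose full signature is not shown above); the names `str_eq` and `is_subset` are hypothetical, and the sketch assumes features are compared as plain strings.

```rust
/// Byte-wise comparison, written with `while` loops so that it can run in a `const fn`.
const fn str_eq(a: &str, b: &str) -> bool {
    let a = a.as_bytes();
    let b = b.as_bytes();
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

/// Hypothetical helper: true if every entry of `required` also appears in `available`.
pub const fn is_subset(required: &[&str], available: &[&str]) -> bool {
    let mut i = 0;
    while i < required.len() {
        let mut found = false;
        let mut j = 0;
        while j < available.len() {
            if str_eq(required[i], available[j]) {
                found = true;
                break;
            }
            j += 1;
        }
        if !found {
            return false;
        }
        i += 1;
    }
    true
}

// Evaluated at compile time: if this check returned false, the build would fail.
const _: () = assert!(is_subset(&["sse"], &["sse", "sse2"]));
```

Because a false positive here would let a `#[target_feature]` function run on a CPU without the feature, any such helper should err towards returning `false`.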

DJMcNab · 2025-10-15T16:13:58Z

The "glamour shot" of this PR is that given:

fearless_simd/fearless_simd_core/src/lib.rs

Lines 236 to 241 in ac8cb44

    #[target_feature(enable = "sse")]
    fn sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        let a: __m128 = bytemuck::must_cast(a);
        let b: __m128 = bytemuck::must_cast(b);
        bytemuck::must_cast(_mm_mul_ps(a, b))
    }

You can run:

fearless_simd/fearless_simd_core/src/lib.rs

Lines 246 to 255 in ac8cb44

    let Some(sse) = x86::v1::Sse::try_new() else {
        panic!("Example code")
    };
    let a = [10_f32, 20_f32, 30_f32, 40_f32];
    let b = [4_f32, 5_f32, 6_f32, 7_f32];
    // Both of these example expansions, the former using the shorthand form:
    let res =
        trampoline!(Sse = sse => "sse", sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4]);
    assert_eq!(res, [40_f32, 100_f32, 180_f32, 280_f32]);

To entirely safely and soundly use Rust's SIMD intrinsics.

To help guide review, the core contribution of this PR is a way to talk about target features in the type system. This is implemented through this trait:

fearless_simd/fearless_simd_core/src/lib.rs

Lines 24 to 50 in ac8cb44

    /// Token that a set of target feature is available.
    ///
    /// Note that this trait is only meaningful when there are values of this type.
    /// That is, to enable the target features in `FEATURES`, you *must* have a value
    /// of this type.
    ///
    /// Values which implement this trait are used in the second argument to [`trampoline!`],
    /// which is a safe abstraction over enabling target features.
    ///
    /// # Safety
    ///
    /// To construct a value of a type implementing this trait, you must have proven that each
    /// target feature in `FEATURES` is available.
    pub unsafe trait TargetFeatureToken: Copy {
        /// The set of target features which the current CPU has, if
        /// you have a value of this type.
        const FEATURES: &[&str];
        /// Enable the target features in `FEATURES` for a single run of `f`, and run it.
        ///
        /// `f` must be marked `#[inline(always)]` for this to work.
        ///
        /// Note that this does *not* enable the target features on the Rust side (e.g. for calling).
        /// To do so, you should instead use [`trampoline!`] directly - this is a convenience wrapper around `trampoline`
        /// for cases where the dispatch of simd values is handled elsewhere.
        fn vectorize<R>(self, f: impl FnOnce() -> R) -> R;
    }

Implementing TargetFeatureToken indicates that a token represents one or more target features being enabled. This token can be used in the new trampoline! macro, to safely use one or more tokens to run code in a #[target_feature(enable = "...")] context. This works by validating the user-provided target feature string, which makes sure that the provided tokens justify executing that function. An example of these being used is:

fearless_simd/fearless_simd_core/src/lib.rs

Lines 249 to 255 in ac8cb44

    let a = [10_f32, 20_f32, 30_f32, 40_f32];
    let b = [4_f32, 5_f32, 6_f32, 7_f32];
    // Both of these example expansions, the former using the shorthand form:
    let res =
        trampoline!(Sse = sse => "sse", sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4]);
    assert_eq!(res, [40_f32, 100_f32, 180_f32, 280_f32]);

In this example, the call to the SSE multiplication function is proven to be safe, and the function is then executed.
The contents of fearless_simd_core/lib.rs are the core contribution of this PR, plus the infra code in trampoline.rs which makes it work.
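As an illustration of the idea that a token type can justify a feature requirement at compile time, the sketch below ties a check to an associated constant. It is a simplified stand-in, not the PR's code: `FeatureToken`, `RequiresSse`, `with_sse` and `FakeSse` are hypothetical names, and `with_sse` does not actually enable any target feature.

```rust
use core::marker::PhantomData;

/// Simplified, hypothetical stand-in for the PR's `TargetFeatureToken`.
///
/// # Safety
/// A value of the implementing type may only exist if every entry of `FEATURES` is available.
pub unsafe trait FeatureToken: Copy {
    const FEATURES: &'static [&'static str];
}

/// Byte-wise search, written with `while` loops so it can run in `const` contexts.
const fn contains(features: &[&str], wanted: &str) -> bool {
    let wanted = wanted.as_bytes();
    let mut i = 0;
    while i < features.len() {
        let candidate = features[i].as_bytes();
        if candidate.len() == wanted.len() {
            let mut j = 0;
            while j < wanted.len() && candidate[j] == wanted[j] {
                j += 1;
            }
            if j == wanted.len() {
                return true;
            }
        }
        i += 1;
    }
    false
}

/// Carries the compile-time check for one token type `T`.
#[allow(dead_code)]
struct RequiresSse<T>(PhantomData<T>);

impl<T: FeatureToken> RequiresSse<T> {
    /// Evaluated once per `T`; using it with a token lacking "sse" fails the build.
    const OK: () = assert!(contains(T::FEATURES, "sse"), "token does not prove `sse`");
}

/// Hypothetical entry point: only compiles for tokens whose `FEATURES` include "sse".
pub fn with_sse<T: FeatureToken, R>(_token: T, f: impl FnOnce() -> R) -> R {
    // Referencing the constant forces the check for this particular `T`.
    let () = RequiresSse::<T>::OK;
    f()
}

/// Example token for the sketch. Implementing the trait without any detection is only
/// acceptable here because `with_sse` never enables a target feature.
#[derive(Clone, Copy)]
pub struct FakeSse;

unsafe impl FeatureToken for FakeSse {
    const FEATURES: &'static [&'static str] = &["sse"];
}
```

A token type whose `FEATURES` lacked `"sse"` would make any call to `with_sse` with that type a compile error rather than a runtime failure, which is the property the description above relies on.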

Separately, in this PR, we have the functionality for properly using this on the x86_64 (and also plain x86) architectures. This is the contents of the x86 folder. This involves:

A token struct for each target feature which Rust supports, with the trivially correct safety checks for constructing them.
A struct for each of x86-64-v{1,2,3,4}, which are the micro-architecture levels of x86. These levels are in v1/level.rs, etc.

Every file in that folder (except for mod.rs files) is automatically generated by the binary crate of the fearless_simd_core/gen package (after rustfmt is run). As such, there isn't really any significant logic in those files.
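For readers who have not opened the generated files, a per-feature token could look roughly like the sketch below. This is an assumption about the general shape, not a copy of the generated code: the struct layout and the non-x86 fallback are hypothetical, and only the standard `is_x86_feature_detected!` macro is relied upon.

```rust
/// Hypothetical token type: a value of `Sse` is proof that the "sse" target feature is available.
#[derive(Clone, Copy, Debug)]
pub struct Sse {
    // Private field: code outside this module cannot build the token
    // without going through `try_new`.
    _private: (),
}

impl Sse {
    /// Returns a token only if the running CPU reports SSE support.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    pub fn try_new() -> Option<Self> {
        if std::arch::is_x86_feature_detected!("sse") {
            Some(Self { _private: () })
        } else {
            None
        }
    }

    /// On other architectures this x86 feature can never be present.
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    pub fn try_new() -> Option<Self> {
        None
    }
}
```

The token is zero-sized, so passing it around costs nothing at runtime; the only work is the detection inside `try_new`.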

taj-p

Soundness makes sense to me! Let me know if these TODOs are deliberate, once you've solved the question of whether to split out x86 into x86 and x86_64, and I'm happy to approve

fearless_simd_core/src/lib.rs

taj-p · 2025-10-24T00:25:19Z

fearless_simd_core/src/lib.rs

+//! These examples use [bytemuck](https://crates.io/crates/bytemuck) for this.
+//!
+//! <!-- TODO -->


Is this TODO deliberately left for completing in a later PR?

Yeah, that was my approximate intention. I'll plan to follow up with that very soon after we land this, but I don't want to potentially block this entire PR on getting those reviewed now.

taj-p · 2025-10-24T00:25:48Z

fearless_simd_core/src/lib.rs

+//!
+//! # Crate Feature Flags
+//!
+//! <!-- TODO -->


Is this a deliberately left TODO?

Yeah, that's correct. This is something we'd update closer to release time - in particular, the std feature currently does nothing other than existing for forward-compatibility, so there aren't actually any meaningful feature flags.

fearless_simd_core/src/lib.rs

fearless_simd_core/src/support.rs

fearless_simd_core/src/lib.rs

ajakubowicz-canva · 2025-10-24T01:55:40Z

fearless_simd_core/src/lib.rs

+/// Note that a function only operating on 128 bytes is probably too small for checking
+/// whether a token exists just for it is worthwhile.


Nit: grammar

I just removed this block, as it's not really all that helpful.

ajakubowicz-canva · 2025-10-24T02:33:05Z

fearless_simd_core/src/lib.rs

+/// /// Perform some computation using SIMD.
+/// #[target_feature(enable = "f1,f2")]
+/// fn uses_simd(val: [f32; 4]) -> [f32; 4] {
+///     // ...
+/// }
+///
+/// let a = [1., 2., 3., 4.];
+/// let Some(token) = token else { return scalar_fallback(a) };
+///
+/// trampoline!(Token = token => "f1,f2", uses_simd(a: [f32; 4]) -> [f32; 4])


I have a minor misunderstanding regarding why the token feature set needs to be declared "f1,f2" here, whilst the feature set is also declared on the uses_simd function?

When would the feature string passed to trampoline and the function target feature strings diverge? On this line what happens if in the trampoline call "f1,f2" is accidentally written as "f1"?

Similarly, I need clarification of the utility of multiple tokens. E.g. [Token = token, Sse = my_sse] => "f1,f2,sse"?

Similarly, I need clarification of the utility of multiple tokens. E.g. [Token = token, Sse = my_sse] => "f1,f2,sse"?

Attempt at answering my own question.
The list of tokens provided to trampoline provides an explicit list of permitted/witnessed features. The function passed to trampoline (uses_simd in this example) has its required features declared in the target_feature attribute. Thus, trampoline enforces that it is only safe to call this function if the required features are a subset of the features provided by the tokens?

Very very cool.

Yeah, exactly. This is due to our dependency on the target features 1.1 Rust feature, which moves that safety check into the Rust compiler.

And yes, you can use multiple tokens exactly as you describe. I'll see about adding some docs for that.

To put this another way, declaring the target feature string in the macro body allows us to move the target features from just being an attribute, to being both an attribute and a const value, which we can then perform validation on.
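The "const value" half of that statement can be sketched as a check over the comma-separated string itself. The helper names below (`segment_eq`, `string_is_covered`) are hypothetical and this is not the PR's validation code; it only shows that such a string can be checked against a token's feature list during constant evaluation.

```rust
/// True if the bytes `s[start..end]` are exactly `feature`.
const fn segment_eq(s: &[u8], start: usize, end: usize, feature: &[u8]) -> bool {
    if end - start != feature.len() {
        return false;
    }
    let mut i = 0;
    while i < feature.len() {
        if s[start + i] != feature[i] {
            return false;
        }
        i += 1;
    }
    true
}

/// Hypothetical check: every comma-separated entry of `requested` appears in `proven`.
pub const fn string_is_covered(requested: &str, proven: &[&str]) -> bool {
    let bytes = requested.as_bytes();
    let mut start = 0;
    // `end` walks one past the final byte so that the last segment is also checked.
    let mut end = 0;
    while end <= bytes.len() {
        if end == bytes.len() || bytes[end] == b',' {
            let mut found = false;
            let mut k = 0;
            while k < proven.len() {
                if segment_eq(bytes, start, end, proven[k].as_bytes()) {
                    found = true;
                    break;
                }
                k += 1;
            }
            if !found {
                return false;
            }
            start = end + 1;
        }
        end += 1;
    }
    true
}

// Evaluated at compile time, like the validation described above.
const _: () = assert!(string_is_covered("sse,sse2", &["sse", "sse2", "sse3"]));
```

In this sketch a string of `"f1"` passes against a token proving `f1` and `f2`; the mismatch with a function declared as `"f1,f2"` would then be caught by the compiler's own target features 1.1 check, as described in the reply above, rather than by the string check.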

fearless_simd_core/src/lib.rs

ajakubowicz-canva · 2025-10-24T03:08:12Z

fearless_simd_core/src/lib.rs

+//! This abstraction is designed to be combined with target features 1.1, the recent update
+//! in the Rust compiler to allow calling `#[target_feature]` functions safely from within
+//! other `#[target_feature]` functions.
+//! As such, once you have used the [`trampoline!`] macro, you can call any intrinsic in [`core::arch`].


I think I am starting to understand the power of this abstraction. Per Stabilize target_feature_11 #134090, it is unsafe to call a function with target_feature declared unless the caller is in a context with those features. Thus the initial call into the target_feature context is unsafe. This trampoline! provides a safe alternative.

I think the "glamour shot" described in comment #108 (comment) makes perfect sense to me now! I really like this PR. Apologies that it's taking me a while to get through. On Monday I plan to go through the code that's outside fearless_simd_core.
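The distinction drawn above can be shown without the macro. The sketch below assumes a toolchain with target features 1.1 and safe `core::arch` intrinsics (the PR's MSRV of 1.89 satisfies this) and an x86_64 target for the SIMD path; the function names are made up for illustration. The first hop from ordinary code needs `unsafe`, while calls between functions enabling the same feature do not.

```rust
#[cfg(target_arch = "x86_64")]
mod simd {
    use core::arch::x86_64::{__m128, _mm_add_ps, _mm_cvtss_f32, _mm_mul_ps, _mm_set1_ps};

    /// A safe `#[target_feature]` function, as permitted by target features 1.1.
    #[target_feature(enable = "sse")]
    fn double(v: __m128) -> __m128 {
        // Intrinsics that only need "sse" can be called here without `unsafe`.
        _mm_add_ps(v, v)
    }

    #[target_feature(enable = "sse")]
    pub fn double_then_square(x: f32) -> f32 {
        // No `unsafe`: caller and callee enable the same feature.
        let d = double(_mm_set1_ps(x));
        _mm_cvtss_f32(_mm_mul_ps(d, d))
    }
}

/// Entry from ordinary code: the first hop into a `#[target_feature]` function is `unsafe`.
pub fn double_then_square(x: f32) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("sse") {
            // SAFETY: "sse" was detected at runtime just above.
            return unsafe { simd::double_then_square(x) };
        }
    }
    // Scalar fallback for other architectures or CPUs.
    (x + x) * (x + x)
}
```

The `unsafe` block in the entry function is the part that `trampoline!` is described as replacing with a token-based, compile-time-checked call.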

fearless_simd_core/src/support.rs

Rename `trampoline.rs` to `support.rs`

The old name conflicted with the name of the macro, leading to it being harder to find the docs of the macro itself.

Remove unneeded reference

Remove entire note on 128 bytes being too small

The point it was making was:
- Fairly hard to explain
- Not necessarily true

Add a few more test cases

Co-authored-by: Taj Pereira

DJMcNab · 2025-10-24T13:00:49Z

Thanks both for the excellent reviews - even though they clearly raced, all the comments were very helpful for improving things!

AndrewJakubowicz · 2025-10-25T01:50:07Z

I had another thought after reviewing your other PR. I'm wondering if this is something that trampoline can express.

Trampoline is excellent because you can safely call into SIMD target-feature functions from callers without those target features. Is trampoline also useful to constrain target features? Imagine that you can use 512-bit vectors but there's some particular function that triggers the CPU to downclock. Could you use trampoline to narrow the target features when calling from a caller with more target features into a SIMD function, intentionally restricting the features available and avoiding a potential CPU downclock? I don't know how useful this would be in practice but I'm curious.

Great work by the way!

Edit: on further thought I realize this is a gap in my target features understanding.

DJMcNab · 2025-10-27T13:13:57Z

Is trampoline also useful to constrain target features? Imagine that you can use 512-bit vectors but there's some particular function that triggers the CPU to downclock. Could you use trampoline to narrow the target features when calling from a caller with more target features into a SIMD function, intentionally restricting the features available and avoiding a potential CPU downclock? I don't know how useful this would be in practice but I'm curious.

There are a few things to say here, to hopefully help guide your understanding:

- As I understand it, this wouldn't need trampoline at all; you should be able to safely call the narrowed function. This is again due to target features 1.1.
- In the current design of fearless_simd, the target features of the caller don't impact the codegen (except insofar as the autovectoriser is concerned); instead, it depends on which implementation of Simd you use.
- The target_feature attribute does impact codegen if you're using std::simd, but that's nightly only. Incidentally, we could make a Simd implementation on top of std::simd.
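The first point can be sketched as follows, assuming an x86_64 target and a toolchain with target features 1.1; the function names are hypothetical. A function that enables more features calls one that enables fewer, with no `unsafe` and no macro involved. The sketch only demonstrates that the call is accepted as safe; it makes no claim about which instructions the compiler ends up emitting.

```rust
#[cfg(target_arch = "x86_64")]
mod simd {
    use core::arch::x86_64::{_mm_add_epi32, _mm_cvtsi128_si32, _mm_set1_epi32};

    /// Requires only "sse2".
    #[target_feature(enable = "sse2")]
    fn add_narrow(a: i32, b: i32) -> i32 {
        _mm_cvtsi128_si32(_mm_add_epi32(_mm_set1_epi32(a), _mm_set1_epi32(b)))
    }

    /// Enables a superset of the features that `add_narrow` needs.
    #[target_feature(enable = "sse2,avx2")]
    pub fn add_from_wide(a: i32, b: i32) -> i32 {
        // Safe call: every feature `add_narrow` requires is also enabled here.
        add_narrow(a, b)
    }
}

pub fn add(a: i32, b: i32) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("sse2")
            && std::arch::is_x86_feature_detected!("avx2")
        {
            // SAFETY: both required features were detected at runtime just above.
            return unsafe { simd::add_from_wide(a, b) };
        }
    }
    // Scalar fallback; wrapping matches the behaviour of `_mm_add_epi32`.
    a.wrapping_add(b)
}
```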

DJMcNab mentioned this pull request Oct 16, 2025

Handle a non-Fallback baseline #105

Merged

DJMcNab added 9 commits October 16, 2025 10:07

Save the version before running the x86 generator

a5bc966

Remove old v1 items

2de5425

Improve the generator

748c0cb

Minor fixups in the generator

3c90813

Also removes unused additional impl support

Add the generated x86 code

3a020c3

Fixup some docs

4ab42b3

Save some generator changes, including adding x86_v{1,2,3,4}

e0899d8

Fixup the generator and mostly finalize levels

77408b1

Re-run update

3ea4c41

DJMcNab force-pushed the trampoline branch from ac8cb44 to 3ea4c41 Compare October 16, 2025 09:07

DJMcNab added 5 commits October 16, 2025 10:16

Bump MSRV to allow avx512 support

dcaf411

Misc cleanups to get ready to launch

53e0064

Handle sse4a and tbm consistently

05be705

Add the final missing docs

1b2aaf3

Add copyright headers

8039eaa

DJMcNab marked this pull request as ready for review October 16, 2025 14:27

DJMcNab added 2 commits October 16, 2025 15:32

Add a CI check for the new generator

f4b835b

Fixup docs on vectorize

ce04c93

ajakubowicz-canva self-requested a review October 19, 2025 23:42

taj-p self-requested a review October 22, 2025 20:07

taj-p reviewed Oct 24, 2025

ajakubowicz-canva reviewed Oct 24, 2025

DJMcNab force-pushed the trampoline branch from eea16b0 to 1039dcc Compare October 24, 2025 12:48

Clean up the stuff about licensing

4c67a23
